Summary

A new approach to the mechanical analysis of English is outlined.
Essentially it is a technique of analysis by successive approximation. It
uses a more or less conventional dictionary giving form class (part of speech)
assignment, or assignments, to each item. With some exceptions, in each case
where a word is assigned to more than one form class it enters a routine to
determine the class to which it belongs in the particular case under analysis.
These routines involve the examination of the immediate environment of the
ambiguous item and are based on a close study of group structure in English.

It is freely admitted that all such ambiguities cannot be correctly resolved
in this way, though a surprisingly large number can. For this reason a
strategy was adopted whereby these preliminary routines were designed not
merely to produce the greatest number of right solutions but to reduce
mistakes as far as possible to those incorrect solutions which can be
recognized and corrected at a later stage in the analysis. The routines are
constructed in such a way that a wrong assignment at the word level usually
means that results will be produced by the subsequent clause analysis routines
which are manifestly wrong. All the information concerning each word is
preserved so that the most likely candidate for reassignment can be chosen
in these cases and the clause routine reapplied. This is called conditional
re-entry. It is continued until a legitimate result is produced. Preliminary
tests indicate that a very high percentage of correct analyses can be obtained
in this way.

Introduction

The fact of the pervasiveness of homophony in a language as morphologically
poor as English does not need to be laboured, though it is unlikely that
anyone who has not read through a few pages of text deliberately looking for
this feature has any idea just how widespread it is. It presents a formidable
barrier to any preliminary processing of data, whether for purposes of
information retrieval or machine translation. It was decided that attention
should be concentrated upon those ambiguities involving the possible
assignment verb. This meant the construction of five separate sets of rules
for the resolution of the ambiguities: noun/verb present tense (point, stage,
face etc.), adjective/verb present tense (clean, complete, close etc.),
noun/verb past participle (and sometimes present tense also) (cut, set, felt,
thought), verb past participle/adjective (fixed, interested, given etc.), and
verb present participle/adjective/noun (meaning, using, running etc.).
Subsequently it was decided to add a sixth set of rules, in which the
ambiguities involved in words of such idiosyncratic distribution as like,
except, might, can, will, even, still, well, and a few others, would be
resolved. A computer routine for recognising inflectional affixes was already
in operation.^1

The approach followed derives from an idea put forward by Dr. M.A.K.
Halliday^2 and called by him shunting, though Halliday intends this as a
general approach to machine translation in all languages and does not specify
the corrective potential that can be built into the procedure, particularly
when dealing with languages like English. Basic to this procedure is the
concept of levels or ranks. Language is regarded as being built up of a
hierarchy of units each one forming the elements in the structure of the unit
of the level next above it. Thus morphemes are the structural elements of
the unit word; words make up groups, groups clauses, and clauses sentences.
In the procedure here described no attention was paid to the smallest unit,
morpheme; nor was the possibility of a unit still higher than the sentence
(paragraph) considered.

^1 See J. Lyons, Fifth Quarterly Report on Automatic Language Analysis.

^2 See M.A.K. Halliday, 'Linguistics and Machine Translation', Zeitschrift
für Phonetik und allgemeine Sprachwissenschaft (University of Berlin)
(forthcoming).

Word  Ambiguity  Routine  Construction 

It was decided in the first place to see how successful we could be in
an attempt to construct diagnostic procedures involving the examination of
each word in the sentence in turn, reading from left to right, resolving
each ambiguity as it arises. This not only limited us to the investigation
of the immediate environment of each word (few rules involve a search of more
than three words to the left or right) but also meant that in most cases the
significant part of the environment is restricted to the left hand side, since
the right so often contains only information that is itself ambiguous. In
practice it turned out that these limitations are much less serious than
might be supposed. The compensating gain in simplicity in programming is
considerable. (A detailed account of the programming of these rules is
given below.) The decision to admit only such simple rules was reinforced
by the discovery that information from wider environments could be much more
efficiently used after division into tentative clauses had been completed.

It is interesting that although many diagnostic features are fairly
obvious some quite powerful ones were not recognized until after a great
deal of patient investigation. For example, the usefulness of the fact that
plural nouns cannot occupy the same place in structure as adjectives (possible
exceptions are goods train and brains trust) was not realised at first. The
importance of certain sub-classifications (information regarding the
sub-classes of each of the major form classes is also given in the dictionary),
particularly countable noun and two object verb, also became clearer as the
work proceeded. On the other hand it was realised from the very beginning
that the distribution of adverbs in English is too haphazard to be of use in
this respect, and most rules involve the instruction not to count adverbs as
part of the environment.

The most significant development in the rules, however, came after the
notion of conditional re-entry had been evolved. A simple illustration of
how this affects the word ambiguity rules is afforded by the case where in
the resolution of a past participle/adjective ambiguity, the only diagnostic
feature in the environment is the occurrence of the word which immediately
before the ambiguous item. Here the instructions are to take the item as an
adjective; since either this is right, or, if it is wrong, then a clause
without a verb will be produced and an error registered. It should perhaps
be emphasised that better results could be obtained from these sets of rules
alone if these considerations were not taken into account; but the resultant
gain by the subsequent correction of errors would, of course, be forfeited.

These routines are now complete and are currently being programmed for
the I.B.M. 709 computer.^3 The noun/verb present tense routine was programmed
and tested on the 650. Each program contains about a thousand instructions.
It is certain that we have not analyzed all the diagnostic environments but
it is hoped that we have included all the most productive ones. Presumably
some further improvements can be expected from the examination of the results
of extended runs of these routines.

^3 These routines were developed by Henrietta Chen, Helga Felder,
Patricia Huffman, Tokuichiro Matsuda, and Bruce Moore.

Clause Analysis Routines

Work on the construction of routines for tentative clause analysis is
well advanced but far from complete. For this purpose a clause is defined
as that part of a sentence containing one and only one verb (modal verbs
excepted). Hence the decision to concentrate on verb ambiguities. It has
been established that clause analysis is most successfully performed by
starting at the end of the sentence and working towards the beginning. The
explanation for this seems to rest on the facts that the typical English
clause reveals greater complexity in the post-verbal part than the pre-verbal
part, and that the beginnings of English clauses are better marked than the
ends (if these can be said to be marked at all). Work has been directed
towards compiling lists of items at which a break always occurs (a much
longer list than can be derived merely by looking at traditional grammar
books) and those at which a break can be made if a verb has previously
occurred. Rules for deciding where to break when the only evidence that a
clause boundary has been crossed is the occurrence of a second verb are also
being devised. Here again it is obviously impossible to produce correct
analyses by the use of a single procedure. Certain corrective devices have
already suggested themselves, notably ones involving counting commas within
tentative clauses.
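
The working definition of a clause given above (one and only one verb, modal verbs excepted) lends itself to a simple mechanical check. The following is a minimal Python sketch, not the project's own code: the tag names and the function names `count_verbs` and `clause_error` are assumptions made for illustration. It flags a tentative clause whose count of non-modal verbs is not exactly one.

```python
# Hypothetical tag set; the report's actual dictionary codes are not given here.
VERB, MODAL, NOUN, ADJ, OTHER = "VERB", "MODAL", "NOUN", "ADJ", "OTHER"


def count_verbs(clause):
    """Count verbs in a tentative clause, leaving modal verbs out of the count."""
    return sum(1 for _word, tag in clause if tag == VERB)


def clause_error(clause):
    """Return None if the clause has exactly one verb, else a short error label."""
    n = count_verbs(clause)
    if n == 0:
        return "no verb"
    if n > 1:
        return "more than one verb"
    return None


good = [("he", OTHER), ("can", MODAL), ("observe", VERB), ("bands", NOUN)]
bad = [("the", OTHER), ("fixed", ADJ), ("point", NOUN)]
print(clause_error(good))
print(clause_error(bad))
```

An error label of this kind is the sort of signal on which a later corrective pass could act.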

Conditional  Re-entry 

Work on this phase obviously awaits the completion of the rules for
clause analysis, but there seems to be no doubt that the phenomenon we are
reporting does regularly occur (which is not to say that the problems of
recognising it when it occurs have been solved). A fairly frequent case is
the analysis generated for a one clause sentence in which no element is
assigned as a verb. (Unfortunately it has to be admitted that any writer is
likely very occasionally to produce a "sentence" without a verb.) But
correction by conditional re-entry is far from being limited to one clause
sentences or complex sentences where the clauses are defined by overt markers,
as the following typical example shows:

One of the most satisfactory laboratory experiments in the field of
mechanics/is the measurement of surface tension by means of a DuNouy
tensiometer.

The word underlined is wrongly assigned as a verb by the word ambiguity
routines resulting in the sentence being analysed into two clauses as
indicated. But such a juxtaposition of clauses in English is impossible.

It is particularly encouraging that the quite frequent errors produced
when the left hand side of the environment consists of a conjunction such as
and or but, which can link either two clauses, two groups, or two words, are
especially easy to spot at the clause level. In the following sentence:

It is only at certain times of day and certain times of year/that he
in fact succeeds/in observing bands/, and further developments in
technique/are therefore required.

the wrong assignment of further produces the incorrect analysis shown, which
is recognizable as such not only by the impossible juxtaposition of the last
two clauses, but by the change in number without any intervening mark of
subordination.

Information concerning the history of the analysis of each word including
the number of the rule by which its ambiguous assignment was resolved will
be carried forward in the output of the word-ambiguity routines. The rules
of each routine are ordered according to their efficiency. In a case where
there are two or more candidates for reassignment the word with the lowest
rule number is reassigned first.

It will be seen that conditional re-entry is a very simple feedback
device. The output of the clause analysis routine itself produces changes
in the input (i.e. the results of the word ambiguity routines).
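
The feedback loop described in this section can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the project's program: the record layout (`tag`, `alternatives`, `rule`), the function names, the rule numbers 7 and 12, and the toy test `has_one_verb` standing in for clause analysis are all hypothetical. The only behaviour taken from the report is that clause analysis is reapplied after each reassignment and that the candidate with the lowest rule number is reassigned first.

```python
def conditional_reentry(words, clause_analysis):
    """Reapply clause analysis, reassigning one word at a time, until it succeeds.

    words: list of dicts with keys "text", "tag" (current assignment),
        "alternatives" (form classes not yet tried) and
        "rule" (number of the rule that resolved the ambiguity, or None).
    clause_analysis: callable returning True when the analysis is legitimate.
    Returns True if a legitimate analysis was reached, False otherwise.
    """
    while not clause_analysis(words):
        candidates = [w for w in words if w["rule"] is not None and w["alternatives"]]
        if not candidates:
            return False  # nothing left to reassign
        # As in the report: the word with the lowest rule number goes first.
        target = min(candidates, key=lambda w: w["rule"])
        target["tag"] = target["alternatives"].pop(0)
    return True


def has_one_verb(words):
    """Toy stand-in for clause analysis: a one-clause sentence needs exactly one verb."""
    return sum(1 for w in words if w["tag"] == "VERB") == 1


sentence = [
    {"text": "the", "tag": "OTHER", "alternatives": [], "rule": None},
    {"text": "experiments", "tag": "VERB", "alternatives": ["NOUN"], "rule": 7},
    {"text": "are", "tag": "VERB", "alternatives": [], "rule": None},
    {"text": "complete", "tag": "ADJ", "alternatives": ["VERB"], "rule": 12},
]
ok = conditional_reentry(sentence, has_one_verb)
tags = [w["tag"] for w in sentence]
print(ok, tags)
```

Each pass consumes one untried alternative, so the loop always terminates, either with a legitimate analysis or with no candidates left.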

The Dictionary

As the work has progressed it has become clear that certain extensive
simplifications can be introduced in the handling of the information supplied
by the dictionary. For example any word entered as a preposition can, for
diagnostic purposes, always be treated as if it were only a preposition no
matter what other classes it can also belong to. A word like since can
always be treated as preposition in the resolution of word ambiguities. The
decision as to whether since itself is, in the particular case, a preposition
or a conjunction depends on the examination of a much wider context and is
postponed until the process of setting up tentative clauses. In fact in most
cases a clear-cut distinction can be made between the information that must
be obtained from the dictionary and the information that must be obtained by
analysis. The fact that in nearly all instances both kinds of information
are actually given in the dictionary tends to be merely misleading.

The ambiguity routines work more efficiently if certain phrases,
particularly those prepositional phrases such as of course and in fact, which
occupy the same place in structure as adverbs, are entered in the dictionary
as single items. This affords the opportunity for introducing other devices
for improving the working of the routines. For example, if the other, others
and other are entered separately, the second can be taken solely as a pronoun
and the third solely as an adjective, both potent diagnostic features.
Similarly small need only be entered as an adjective since in the phrase
the small of the back it occupies a place in structure which can only be
filled by nouns. There are numerous other cases.

The last example raises a more general point. The ambiguous
classification noun/adjective does not occur in the dictionary at all since
this can only ever be resolved by the analysis of structure. This, like the
other ambiguities hitherto neglected, adjective/pronoun and adjective/adverb,
is among the last to be handled, in routines for building up groups within
clauses. The final decision regarding the assignment of past participle
preceded by part of to have and present participle preceded by part of to be,
which up to this point have been treated as adjectives, as either adjectives
or non-finite verbs is also most conveniently made at this point. These are
the simplest of all the routines in spite of the fact that ambiguities are
resolved and group boundaries established simultaneously. The decision as to
which is structurally a noun and which an adjective in a string of what by
dictionary look-up or ambiguity routines have been assigned as nouns and
adjectives involves little more than counting. This has interesting
implications. English has been typologically classified as a langue
groupante. Certainly the greatest structural complexity is found at the group
level. It is perhaps worth emphasizing therefore that approached in this way
group analysis becomes the easiest part of the whole task.

II

Computer Technique for Resolution of
Word Ambiguities


Introduction

The computer technique for resolving form class ambiguities revolves
about the application of the rules, developed by the linguists, to the words
in a sentence. Although the application of these rules is the core of the
problem and the main program in the computer technique, it by no means
represents the total computer problem. This is essentially a data processing
problem. It is a system of many programs which manipulate the text, extracting
significant facts and building the files of information which feed the main
computer program.

An experiment, conducted on the IBM 650 computer, tested a technique
for rule application. Only the set of noun/verb present tense rules was
used in this experiment. Input data was simulated for the experiment since
programs for information file development were not available. The technique
required two computer programs: one composed of approximately 500
instructions, and the other, of over 1000 instructions. Since the results of
the experiment proved satisfactory the same technique has been employed in
the main program of the IBM 709 data processing system currently under
development.

The main computer program of the IBM 709 system involves the application
of the six sets of rules mentioned earlier in this report. Each rule set is
being written in subroutine form to be called upon by a major control program.
This program, with its subroutines, will be described in detail in a later
section. The preparation of this portion of the system is being carried out
by five programmers.^4 At the same time these programmers, and a sixth,^5 are
writing the auxiliary programs required to build the information files for
the main program of the system. All IBM 709 programs are coded in the FAP
language^6 and make use of the IOCS.^7

The remainder of this report describes the data processing system for
resolving part-of-speech ambiguities. A block diagram of the system is given
in Appendix C.

Data  Preparation 

The raw material used in the computer solution to the word ambiguities
problem can be divided into two categories: dictionary data, and scientific
text data.


Dictionary Data

The former category was prepared as a tape file for use with the IBM 650
computer programs. This dictionary tape file has been converted by an IBM 7090
routine^8 to a format which is suitable for the University's present computers,
the IBM 1401 and 709. At the time of its conversion, the dictionary file was
modified both in content and in structure to its present format. (See
Appendix B, File 1.) The preparation of this category of data is, therefore,
complete. Additions, deletions or changes to the file will hereafter be made
through the updating program which will be described in a later section.

^4 Wa Bums, Ray Cook, Kay Estes, Jerome Wenker, and S. S. Varma.

^5 Joseph Buckley.

^6 709/7090 Processing System Bulletin J28-6098-1, 7/61, Fortran Assembly
Program (FAP) for the IBM 709/7090.

^7 IBM Reference Manual C28-6100-1, 709/7090 Input/Output Control System.

Text  Data 

The initial scientific text, Planet Earth,^9 was also prepared as a tape
file for use with the IBM 650 computer programs and it also has been converted
by the above-mentioned routine.^8 (See Appendix B, File 2.) A special IBM 709
program is being written which will be used for assigning to words of this text
only the text identification described below.

The preparation of scientific text data is a continuous part of the
data processing system developed for this project. A description of the
manner in which this data is prepared follows.

Text data preparation is carried out in two stages: the conversion of
the printed material to punched cards; and the conversion of the punched
cards to tape files.

Text-to-card

The text is punched (See Appendix A, Format 2.) in much the same way as
it is typed.^10 The following conventions are observed in the key-punch
operation.

^8 D. E. Flanigan, IBM, Tape Conversion Routine for the IBM 7090.

^9 Planet Earth, Karl Stumpff (Ann Arbor, 1959).

^10 Titles, subtitles, and questions are not punched.


1) Every sentence begins a new card.

2) The punching format for the text portion of the card is represented

F (P1) W (P2) S where

F is the format of the word, i.e., upper case, italics, etc.;

P1 is the punctuation which precedes the word, i.e., quotes, etc.;

W is the word itself as it appears in the printed material;^11

P2 is the punctuation which follows the word, i.e., comma, dash,
quotes, etc.; (P2 may occupy several card columns.)

S is a blank column which designates the end of the word in text.

3) A sentence may be continuous over many cards. When
one-less-than-the-column-limit per card is reached within the middle of a
word of the text, a hyphen is inserted and the word is continued on the next
card.

4) Each punched card contains a ten-digit identification number. This
number serves a dual purpose. It maintains the proper card sequence, and it
provides a means of locating the sentence within the text. The identification
number assigned in the key-punch operation is divided as follows: digits 1-3
represent the page of text; digit 4 represents the paragraph on the page;
digits 5-6 represent the sentence number within the paragraph; and digits
7-10 are zeros. (A page number does not change unless there is also a
paragraph change. This means that when a paragraph is continuous from one
page to the next, the page number on which the paragraph originates is the
page number in the identification number for all sentences within that
paragraph.)
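
The layout of the key-punch identification number in convention 4 can be illustrated with a short Python sketch. The function name `card_id` and the range checks are assumptions added for illustration; only the digit layout (page, paragraph, sentence, four zeros) comes from the text above.

```python
def card_id(page, paragraph, sentence):
    """Build the ten-digit key-punch identification number.

    Digits 1-3: page; digit 4: paragraph; digits 5-6: sentence; digits 7-10: zeros.
    """
    if not (0 <= page <= 999 and 0 <= paragraph <= 9 and 0 <= sentence <= 99):
        raise ValueError("field does not fit its digit width")
    return f"{page:03d}{paragraph:1d}{sentence:02d}0000"


first = card_id(1, 1, 1)
later = card_id(27, 3, 12)
print(first)
print(later)
```

Because the fields are fixed-width and zero-padded, comparing two such numbers as strings orders the cards by page, then paragraph, then sentence.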

Card-to-Tape

The primary purpose of the card-to-tape conversion operation is to
provide an efficient form of input to the data processing system. Its
secondary function is to refine the identification number which was assigned
during the key-punch operation.

^11 Numerals and symbols are punched as NNN and SSS, respectively, in this
field.

For the purpose of resolving part-of-speech ambiguities in sentences,
certain boundaries are recognized within the ultimate bound of the sentence
itself. Some can be readily identified by punctuation marks. Dashes and
parentheses, for example, are obvious symbols of breaks within the sentence.
The text identification number assigned during the card-to-tape operation
makes possible the division of the sentence into its "analyzable units."

The format of the 10-digit text identification number is:

PPP P SS C U WD where

PPP is the page number in the text;

P is the paragraph number on the page;

SS is the sentence number within the paragraph;

C is the clause within the sentence;

U is the unit within the clause; and

WD is the word in the sentence.
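
As an illustration of this layout, the following Python sketch splits a text identification number into its six fields. The class name `TextId`, the function name `parse_text_id`, and the sample number are assumptions for illustration; the field widths are those listed above.

```python
from typing import NamedTuple


class TextId(NamedTuple):
    page: int
    paragraph: int
    sentence: int
    clause: int
    unit: int
    word: int


def parse_text_id(ident):
    """Split a 10-digit text identification number into its six fields."""
    if len(ident) != 10 or not (ident.isascii() and ident.isdigit()):
        raise ValueError("expected exactly ten digits")
    return TextId(
        page=int(ident[0:3]),       # digits 1-3
        paragraph=int(ident[3]),    # digit 4
        sentence=int(ident[4:6]),   # digits 5-6
        clause=int(ident[6]),       # digit 7
        unit=int(ident[7]),         # digit 8
        word=int(ident[8:10]),      # digits 9-10
    )


parsed = parse_text_id("0011011103")
print(parsed)
```

Here the sample number describes word 3 of sentence 1, paragraph 1, page 1, lying in unit 1 of clause 1.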

Each word of the text is converted to one logical record on tape (See
Appendix B, File 2), and is identified by the above text identification
number. By standard IBM sort routines^12 the text file may be sorted in one
of two sequences depending on which digits of the text identification number
are keys of the sort. In the first example which follows, the keys are
digits 1-6 and 9-10. This sequences the text file, one sentence in this
case, into original text order. In the second example, the sort keys are
digits 1-10. This sequences the file into "analysable unit" order.

^12 IBM Reference Manual C26-6036, 1959, GENERALIZED SORTING PROGRAM FOR
THE IBM 709 DATA PROCESSING SYSTEM SORT 709.


    The         boy         I           saw         him         from        the
ID  0011011001  0011011002  0011011103  0011011104  0011011105  0011011106  0011011107

    window      went        to          the         store;      the         girl
ID  0011011108  0011011009  0011011010  0011011011  0011011012  0011012013  0011012014

    went        to          the         movies.
ID  0011012015  0011012016  0011012017  0011012018

Example 1

The boy I saw him from the window went to the store the girl went to the movies.^13

Example 2

The boy went to the store I saw him from the window the girl went to the movies.

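
The two sort sequences can be reproduced with ordinary key-based sorting. The Python sketch below is an illustration only, not the IBM sort routine: the function names are hypothetical, and the identification numbers are a reconstruction of the example sentence's numbers from the page, so they should be read as assumed values.

```python
# (word, text identification number) pairs; the numbers are a reconstruction.
records = [
    ("The", "0011011001"), ("boy", "0011011002"),
    ("I", "0011011103"), ("saw", "0011011104"), ("him", "0011011105"),
    ("from", "0011011106"), ("the", "0011011107"), ("window", "0011011108"),
    ("went", "0011011009"), ("to", "0011011010"), ("the", "0011011011"),
    ("store", "0011011012"),
    ("the", "0011012013"), ("girl", "0011012014"), ("went", "0011012015"),
    ("to", "0011012016"), ("the", "0011012017"), ("movies", "0011012018"),
]


def text_order_key(ident):
    """Digits 1-6 and 9-10: page, paragraph, sentence, then word number."""
    return ident[0:6] + ident[8:10]


def unit_order_key(ident):
    """Digits 1-10: the whole number, so clause and unit outrank word number."""
    return ident


def sentence(recs, key):
    """Join the words of recs after sorting them by key(identification number)."""
    return " ".join(word for word, _ in sorted(recs, key=lambda r: key(r[1])))


text_order = sentence(records, text_order_key)
unit_order = sentence(records, unit_order_key)
print(text_order)
print(unit_order)
```

Skipping digits 7-8 in the first key is what restores running text order, since the clause and unit digits are the only ones that depart from it.
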
File  Standards 

For purposes of uniformity within the data processing system, all files
conform to the following standards:

1) Tapes are written at low density;

2) Logical record length within any file is fixed;

3) Block (physical tape record) length within any file is fixed;

4) Data files are all BCD files;

5) In addition to its data blocks, each data file contains one header
record and one trailer record as described in the IOCS manual.^14

6) Each blank tape mounted for use as output of the 709 computer
programs contains the Blank Tape Label described in the IOCS Manual.^15

File Maintenance

The data processing system for resolving word ambiguities requires two
types of maintenance programs: sort programs and update programs.

^13 Since the punctuation which determines the units is no longer needed
for resolution of word ambiguities, it is omitted from the tape record at
card-to-tape conversion. The text identification number may be used to
reconstruct this punctuation if it is later considered important. All other
sentence punctuation is converted to tape.

^14 IBM Reference Manual C28-6100-1, 709/7090 INPUT/OUTPUT CONTROL SYSTEM,
pp.

^15 Ibid., p. 21.

Sorting is carried out by means of the standard IBM 709 Sort. Sort
sequences are designated both in the file descriptions (Appendix B) and in
the block diagrams (Appendix C).

Updating of all files of the system may be carried out by means of a
single IBM 709 program which is currently being written. Although intended
originally for updating the dictionary file only, this program has been
designed as a multi-purpose routine for updating both data and instruction
files. In addition to enabling changes to be readily introduced to these
tape files, the program provides a means of examining selected parts of the
files' contents.

Update Program

The particular function which the program performs at any one time
depends upon the specifications stated by the user. These specifications
include a general description of the characteristics of the file to be
modified and of the manner of modification (Appendix A, Formats 3 & 5), and
a detailed description of each type of modification (Appendix A, Formats 4 & 6).

Specifications are punched on cards and converted to tape by a standard
IBM 1401 program in which each card becomes one fourteen-word record. (See
Appendix B, File 3.) This "change" tape and the tape file to be modified
are the two inputs to the file update program. The primary output of the
program is the modified file. A secondary output file can be produced if
requested in the specifications.

Instrxtctlon  Files 

Instruction  files  are  updated  by  means  of  the  specification  cards 
and  4*6*  These  cards  direct  the  program  to  locate  a  blook  by  Its  position 
within  the  file  and  to  change  a  word  within  that  block*  There  may  be  any 
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number  of  alteretlone  vlthln  the  same  file  blockf  each  specified  by  a  distinct 
card 

Data  Files 

Data  files  are  updated  by  means  of  cards  #3  and  Depending  on  the 
use  of  the  parameters  in  these  cards  the  program  may  be  directed  toi 

1)  locate  a  logical  record(block)  by  its  position  in  the  flle^  and  add 
N  logical  reoords(bloeks )  Immediately  after  the  one  looated» 

The actual records (blocks) added follow immediately behind the
specification card #4. Each of the N records (blocks) must begin in column 1
of a card; the record (block) may be continued from card to card (78 columns
per card) until the full size of the record (block) has been punched.
(Continuation cards are identified by a "c" in column 79.) Final zero padding
of records (blocks) need not be punched. Since the record and block size are
both stated in card #3, a full record (block) will always be added.

2) locate a record (block) by its position in the file, and delete N
records (blocks), beginning with that record (block) or the following one.
No records (blocks) need follow this specification card #4.

It is possible in updating data files to delete certain records (blocks)
and to add others at the same point in the file. This must be done by means
of two specification cards #4, each referring to the record (block) after
which records (blocks) are to be added. The specification card with the "Add"
parameter and the records (blocks) to be added must precede the specification
card with the "Delete" parameter.
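As a rough illustration of directions 1) and 2), the following Python sketch applies positional add and delete specifications to a file held in memory as a list of records. The `Spec` class, its field names and the function name are hypothetical stand-ins for the specification cards; the actual program worked from tape and card images, and its coding is not given in this report.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Spec:
    """Hypothetical stand-in for one detail specification card (#4)."""
    action: str                        # "A" = add, "D" = delete
    position: int                      # 1-based position of the located record
    count: int                         # N, the number of records added or deleted
    start: str = "0"                   # deletes only: "0" = located record, "N" = next one
    new_records: Tuple[str, ...] = ()  # the N records that follow an "A" card


def apply_positional_specs(records: List[str], specs: List[Spec]) -> List[str]:
    """Return the modified file; positions always refer to the original file."""
    adds: Dict[int, List[str]] = {}
    deleted: Set[int] = set()
    for spec in specs:
        if spec.action == "A":
            if len(spec.new_records) != spec.count:
                raise ValueError("an Add card must be followed by exactly N records")
            adds.setdefault(spec.position, []).extend(spec.new_records)
        elif spec.action == "D":
            first = spec.position if spec.start == "0" else spec.position + 1
            deleted.update(range(first, first + spec.count))
        else:
            raise ValueError(f"unsupported action {spec.action!r}")

    modified: List[str] = []
    for position, record in enumerate(records, start=1):
        if position not in deleted:
            modified.append(record)
        # Additions are written immediately after the located record.
        modified.extend(adds.get(position, []))
    return modified


# Replace records 3 and 4: both cards refer to record 2, and the Add card comes first.
original = ["r1", "r2", "r3", "r4", "r5"]
cards = [
    Spec("A", position=2, count=2, new_records=("x1", "x2")),
    Spec("D", position=2, count=2, start="N"),
]
updated = apply_positional_specs(original, cards)
```

In this sketch `updated` keeps records 1 and 2, inserts the two new records after record 2, and drops the original records 3 and 4, which is the combined delete-and-add case described above.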

Note on locating by position: By this direction each specification card
#4 (#6) applies to a single record (block) in the file. Once this record
(block) is located and the modification made, the specification card has no
further application. This type of direction assumes the "change" file to be
in the same order as the file to be modified.

3) locate a record (block) by its position, and change that record (block)
[...] of the file with the match field stated on the specification card #4.
When a record (block) is found in which the two fields agree, delete or add N
records (blocks), or change that record (block) by the replacement field of
the specification card #4.

locate a specified field in every record (block) of the file.
Whenever the field specified matches the field stated on the specification
card #4, change the record (block) by the replacement field of the
specification card #4.

The dictionary file (see Appendix B, File 1) may be updated by record in
any of the above-mentioned ways. However, in addition, the logical records
within the dictionary file which contain the count of the number of other
records within an alphabetic grouping are modified in accordance with the
addition or deletion of records during updating.

Data files may also be examined by means of the specification cards #3
and #4. When used for this purpose, the specification cards direct the
program to locate a specified field in every record (block) of the file.
Whenever the record (block) field matches the field specified in card #4, the
record (block) is placed on the secondary output file.

When the locating is carried out at the record level, neither the
matching field nor the replacing field may exceed the size of the record.
When the locating is carried out at the block level, neither the
matching field nor the replacing field may exceed the size of the block.

The match field must always be completely stated on the specification card #4;
the replacement field must begin on the specification card but may continue
from card to card until it is completely stated. No characters, including
zeros, may be omitted from the field statement.

Note on locating a specified field in every record (block): By this
direction all specification cards #4 are applied to each record (block) of the
file. This makes possible a change which is universal to all records (blocks)
of the file, or, if no change is requested, the selection of all records
(blocks) which have some common characteristic.

All specification cards #4 except those using the "Add" parameter may
request secondary output. When requested in connection with deletions, the
secondary output file will contain all records (blocks) which have been
deleted from the modified file.

When requested in connection with changes, the secondary output file
will contain all records which have been modified on the primary output file.
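The field-matching directions and the secondary output described above can be pictured with a short Python sketch. Records are modelled as fixed-length strings and field positions as 0-based character offsets; the function names, the offsets and the sample records are assumptions made for illustration and do not reproduce the card formats of Appendix A.

```python
from typing import List, Tuple


def field_matches(record: str, start: int, match: str) -> bool:
    """True if the field beginning at 0-based offset `start` equals `match`."""
    return record[start:start + len(match)] == match


def search(records: List[str], start: int, match: str) -> List[str]:
    """Examination only: matching records go to the secondary output."""
    return [record for record in records if field_matches(record, start, match)]


def change_all(records: List[str], start: int, match: str,
               change_start: int, replacement: str) -> Tuple[List[str], List[str]]:
    """Apply one change to every matching record.

    Returns (primary output, secondary output); the secondary output holds
    the records that were modified.
    """
    primary: List[str] = []
    secondary: List[str] = []
    for record in records:
        if field_matches(record, start, match):
            end = change_start + len(replacement)
            if end > len(record):
                # The replacing field may not exceed the size of the record.
                raise ValueError("replacement field exceeds the record size")
            record = record[:change_start] + replacement + record[end:]
            secondary.append(record)
        primary.append(record)
    return primary, secondary


sample = ["AB0001", "CD0002", "AB0003"]
selected = search(sample, 0, "AB")
primary, secondary = change_all(sample, 0, "AB", 2, "99")
```

Here `search` corresponds to using the cards for examination only, while `change_all` corresponds to a universal change with secondary output requested.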

Information Gathering

With data files prepared, updated and maintained in various sort sequences,
the next stage in the data processing system is one of information gathering.

A single program, the Affix-Dictionary program, has been written to gather
information from the dictionary file and append it to the text file.

The Affix-Dictionary Program of the IBM 709 system is quite different
from the program written for the IBM 650 Computer. The difference stems
from the development of the project's aims in the last few months. The
initial program was intended to provide the linguists with a step-by-step
picture of form class derivation by affix removal. The Affix-Dictionary
program of the 709 system was designed specifically to simplify the task of
resolving word ambiguities. For this reason the 709 program produces only
the resultant form classes and that information regarding them which
contributes significantly to ambiguity resolution. In addition to the basic
change in purpose between the two programs there are changes based on the
new format and content of the dictionary file described earlier in this
report.

Affix-Dictionary Program

Input files to the Affix-Dictionary Program are two: the dictionary
file (Appendix B, File #1) and the text file (Appendix B, File #2). The
latter is in alphabetic word sequence to enable a straightforward matching of
the two files.

Output from the program is an appended text file (Appendix B, File #4)
and an Error File (Appendix B, File #5). The Error File is produced when
there is no matching dictionary entry for a word of the text or for its stem
after affix removal. When this situation arises, additional entries must be
made to the dictionary file by means of the Update Program. The Affix-
Dictionary Program must then be repeated with the updated dictionary file as
input.

By means of this program the 10-word record (Appendix B, File #2)
representing a word of text increases to a 20-word record. The Affix-
Dictionary Program inserts into the expanded record (Appendix B, File #4)
codes representing all the form classes to which the word of text may belong.
A code indicating a preferred form class, if there is one, is also added.
Subclassifications of form classes and the stem (word found in the dictionary)
are inserted. Coded representations of those affixes removed (prior to
locating the stem) and the next removable affix are included in the expanded
record.
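The lookup described above, in which a text word is matched against the dictionary either directly or after affix removal, might be sketched in Python as follows. The tiny dictionary, the affix list and the field names are invented for illustration, and the sketch simply copies the form classes of the stem; it does not attempt the form-class derivation that the actual program performs, nor does it record the next removable affix.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical dictionary: stem -> form classes to which it may belong.
DICTIONARY: Dict[str, Tuple[str, ...]] = {
    "walk": ("NOUN", "VERB"),
    "the": ("ARTICLE",),
}
# Hypothetical list of removable affixes, tried in this order.
AFFIXES: Tuple[str, ...] = ("ing", "ed", "s")


def look_up(word: str) -> Optional[dict]:
    """Find the word or its stem in the dictionary, removing affixes as needed."""
    stem = word
    removed: List[str] = []
    while stem not in DICTIONARY:
        for affix in AFFIXES:
            if stem.endswith(affix) and len(stem) > len(affix):
                stem = stem[: -len(affix)]
                removed.append(affix)
                break
        else:
            return None  # no dictionary entry and no affix left to remove
    return {
        "word": word,
        "stem": stem,
        "affixes_removed": removed,
        "form_classes": DICTIONARY[stem],
    }


def append_dictionary_info(text_words: List[str]) -> Tuple[List[dict], List[str]]:
    """Return (appended text records, error file entries)."""
    appended: List[dict] = []
    errors: List[str] = []
    for word in text_words:
        entry = look_up(word)
        if entry is None:
            errors.append(word)   # must be added to the dictionary, then rerun
        else:
            appended.append(entry)
    return appended, errors


appended, errors = append_dictionary_info(["the", "walks", "xyzzy"])
```

Words left in `errors` play the part of the Error File: they would be entered into the dictionary with the Update Program before the lookup is repeated.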

Although more information must be known about each text word before
ambiguities can be resolved, the remaining information can be gathered in
the Resolution of Ambiguities program which will be described in the next
section.

Once the appended file has been produced it is sequenced into the
analyzable unit order (Text Identification) in order that the ambiguities
may be resolved sequentially from left to right.

Resolution of Word Ambiguities

The resolution of word ambiguities within an analyzable unit is carried
out on the IBM 709 Computer in a manner which imitates the method employed
by the linguist. The procedure in its simplest form might be stated as
follows:

1) Within an analyzable unit (left to right), note all words which are
members of only one form class, i.e., never ambiguous;

2) Within the same unit (left to right), resolve the ambiguities of
certain words of unusual distribution, as mine or like. (This step involves
the application of one of a group of rules designed specifically for the
resolution of ambiguities of words in this class. Examination of the
immediate environment of the words is required.)

3) Within the same unit (left to right), determine the particular type
of ambiguity, such as noun/verb present tense or noun/verb past tense. (The
ambiguity is resolved by applying a suitable rule from a group designed for
the resolution of this type of ambiguity. The application of the rule
requires testing the immediate environment of the ambiguous word.)

The program written to follow this procedure consists of a control
routine (see Appendix D, Control Program) and a group of subroutines.

Control  Routine 

The control routine reads the analyzable unit from the appended text
file into the computer memory, noting (step 1 above), as it reads, the words
which are unambiguous. For each of these words the control routine places
the appropriate English word, e.g., NOUN, into a specified part of its
memory record.

When the unit has been completely read into the memory, the control
routine begins its second pass. In this pass it determines the words
belonging to the class defined as "words of unusual distribution" and
transfers to a subroutine which applies rules sequentially until it resolves
the ambiguity of the word. The subroutine then supplies the English word
for the form class and the resolving rule number to the specified part of the
memory record for this word of text. Control returns again to the control
routine, which proceeds in this fashion until it again reaches the end of the
unit.

When all such words have been resolved, the control routine makes its
third and final pass through the unit. In this pass it determines the
particular type of ambiguity of all other ambiguous words and transfers to
the appropriate subroutines for resolving them. These subroutines function
in the manner described in the preceding paragraph.

When all words of the unit have been resolved, the control program
writes the unit, whose records (Appendix B, File #6) now show the resolved
form class and the resolving rule number.

The procedure is repeated until all the units of the text have been
analysed.
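The three passes of the control routine can be outlined in Python as below. This is only a sketch under stated assumptions: the record layout (a dictionary with `word` and `form_classes` keys), the resolver functions and the toy rules inside them are hypothetical, and the real routine works on 20-word memory records rather than Python objects.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
Resolver = Callable[[List[Record], int], Tuple[str, int]]  # -> (form class, rule number)

UNUSUAL_WORDS = {"mine", "like"}  # the two examples named in the text


def resolve_unit(unit: List[Record],
                 unusual_resolver: Resolver,
                 type_resolvers: Dict[Tuple[str, ...], Resolver]) -> List[Record]:
    # Pass 1: note the words that belong to only one form class.
    for record in unit:
        if len(record["form_classes"]) == 1:
            record["resolved"] = record["form_classes"][0]

    # Pass 2: resolve the words of unusual distribution.
    for i, record in enumerate(unit):
        if "resolved" not in record and record["word"] in UNUSUAL_WORDS:
            record["resolved"], record["rule"] = unusual_resolver(unit, i)

    # Pass 3: determine the type of each remaining ambiguity and hand the
    # word to the subroutine for that type.
    for i, record in enumerate(unit):
        if "resolved" not in record:
            ambiguity_type = tuple(record["form_classes"])
            record["resolved"], record["rule"] = type_resolvers[ambiguity_type](unit, i)
    return unit


# Toy resolvers, invented for this example only.
def toy_unusual(unit: List[Record], i: int) -> Tuple[str, int]:
    return "PREP", 1


def toy_noun_verb(unit: List[Record], i: int) -> Tuple[str, int]:
    if i > 0 and unit[i - 1].get("resolved") == "ARTICLE":
        return "NOUN", 1
    return "VERB", 2


unit = [
    {"word": "walks", "form_classes": ["NOUN", "VERB"]},
    {"word": "like", "form_classes": ["VERB", "PREP"]},
    {"word": "the", "form_classes": ["ARTICLE"]},
    {"word": "walk", "form_classes": ["NOUN", "VERB"]},
]
resolved = resolve_unit(unit, toy_unusual, {("NOUN", "VERB"): toy_noun_verb})
```

Because the unambiguous words are recorded first, a rule applied in a later pass can already consult the resolved form classes of its neighbours.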

Indicator  Subroutine 

Omitted from the above description of the control routine is reference
to a special subroutine, the Indicator Subroutine (Appendix F, Indicator
Flow Chart). This subroutine completes the information gathering process
begun in the Affix-Dictionary program and provides the control program and
the remaining subroutines with sufficient facts for determining the type of
ambiguity and for applying the specific rules for its resolution.

Resolution of form-class ambiguities depends upon the analysis of
regularly recurring environments (indicator situations). First these
indicator situations were broken down into parts: characteristics of the
ambiguous item itself (for example, being the first word in the sentence,
capitalisation, etc.), characteristics of immediately preceding words,
characteristics of preceding word + 1, characteristics of preceding word
ignoring non-prepositional adverbs, and so on. Indicator categories were
then set up and codes were given to each of these (indicator codes) which
could be used in the machine. For example, one part of the indicator
situation for several rules among the six rule sets is the presence of a
modal, copulative, or auxiliary verb. For one situation a member of this
class is required to be immediately in front of the ambiguous item; for
another, non-prepositional adverbs may be ignored. Again, for others a member
of this class must immediately follow the ambiguous item, in one case
immediately following a word belonging to another category which itself
follows the ambiguous item; in another, non-prepositional adverbs may be
ignored. Finally, in one rule it is only required to be the next verb.
Nevertheless there is only one indicator code for this class: a "1" in the
10th position of the first indicator word. This means that each word in the
sentence is tested to see if it is a member of this class, and if it is, a
"1" is placed in the arbitrarily determined position of the arbitrarily
chosen indicator word; otherwise there is a "0" in this position. Since this
class is a subclass of a slightly larger indicator category, a "1" would be
put in the predetermined place indicating that the word is a member of this
larger class also. Since both modals and auxiliaries are themselves members
of other indicator categories, "indicators" must be placed in several places
for such an item. Before any attempt is made to resolve the verbal
ambiguities every word in the sentence must be tested to see if it fits one
of these indicator classes.
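A minimal Python sketch of such indicator codes follows. It assumes 36-bit indicator words with positions counted from the left, starting at 1; only the "10th position of the first indicator word" comes from the text. The position chosen for the larger category, the number of indicator words and the word list are hypothetical.

```python
from typing import List, Tuple

WORD_BITS = 36  # assumed width of one indicator word

# (indicator word number, position): the first pair is the code named in the
# text; the second is a hypothetical position for the larger category.
MODAL_COPULA_AUX: Tuple[int, int] = (0, 10)
LARGER_CATEGORY: Tuple[int, int] = (0, 11)

# Hypothetical membership list for the modal/copulative/auxiliary class.
MODAL_COPULA_AUX_WORDS = {"is", "was", "may", "must", "have"}


def _mask(position: int) -> int:
    """Bit mask for a position counted from the left, 1..WORD_BITS."""
    if not 1 <= position <= WORD_BITS:
        raise ValueError("position must lie between 1 and 36")
    return 1 << (WORD_BITS - position)


def set_indicator(words: List[int], word_no: int, position: int) -> None:
    words[word_no] |= _mask(position)


def has_indicator(words: List[int], word_no: int, position: int) -> bool:
    return bool(words[word_no] & _mask(position))


def indicate(text_word: str) -> List[int]:
    """Build the indicator words for one word of text (three words assumed)."""
    words = [0, 0, 0]
    if text_word.lower() in MODAL_COPULA_AUX_WORDS:
        set_indicator(words, *MODAL_COPULA_AUX)
        # Membership in the subclass implies membership in the larger class.
        set_indicator(words, *LARGER_CATEGORY)
    return words


must_codes = indicate("must")
table_codes = indicate("table")
```

A rule that needs to know whether a neighbouring word is a modal, copulative or auxiliary verb then calls `has_indicator(codes, *MODAL_COPULA_AUX)` instead of comparing the word against a list.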

On first consideration the task, performed by the control routine, of
determining the type of ambiguity of a word seems relatively straightforward.
It can be shown, however, to be quite complex, involving many computer
instructions. For this reason the control program transfers to the Indicator
Subroutine. The Indicator Subroutine performs the necessary tests and
classifies each word by inserting a number in the Ambiguous Word Code
(Appendix B, File #6). The control routine need then make only a single bit
test to determine which type of ambiguity exists and, thereby, determine
which subroutine must be entered.

In providing sufficient information for the subroutines to operate
efficiently, the Indicator Subroutine is even more valuable. A rule from one
of the subroutines will serve to illustrate its value.


Has it got "to" immediately in front of it? If yes, see if the ambiguous
item has affix "s". If yes, take as a noun. If no, see if the preceding
verb is one of the group of words belong, attribute, pertain, cling,
[illegible], convert, ascribe, reconvert, [illegible], [illegible],
[illegible], subject, lead, or if the word immediately before "to" is the
same as the ambiguous item, or if this word is one of the group of words
proportional, attributable, as, [illegible], regard, respect, subject, due,
prior, resistant, [illegible], equivalent, opposite, contrary, [illegible],
inimical, [illegible], amenable. If yes, take as noun. If no, ...

Obviously the rule is complex even in the number of questions it asks
before a resolution can be made. Each of the subroutines has between 20 and
40 rules of varying complexity. The additional complication of determining,
for example, if the word in question is a member of one of the groups
mentioned above increases the rule's complexity from a programming standpoint
and decreases its flexibility.


Flexibility is a most important feature of this entire data processing
system. It is especially important in the program for resolving ambiguities.
There have been changes both in rules and rule ordering. There have also
been insertions, deletions, and changes in word groupings such as the groups
underlined in the above illustration. It is anticipated that when the
results of the computer programs are studied, more changes will prove
necessary.

A decision was made, therefore, that in order to maintain maximum flexibility
the rules performed by the subroutines should be stated simply and directly
in terms of computer instructions and that the determination of "belonging"
to classes should be separated from the rules.


In the example above, the Indicator Subroutine "indicates" that a word
of the unit does or does not belong to the group of words [illegible], give,
etc. by simply storing a "1" or "0", respectively, in a certain position of a
computer word in the memory record. The "1" and "0" are known as indicator
codes; the computer word in the memory record is known as an indicator code
word. There are 95 indicator codes. The groups which can be classified by
means of indicator codes are varied. Among them are the following:

1) all words which are either modal verbs, copulative verbs, or verbs
which are forms of to be or to have.

2) all words which are either possessive adjectives, or ambiguous
adjective/pronouns, or ambiguous nouns.

3) all words which are either verbs or prepositions or conjunctions or
words which end in -ing.

The control routine calls upon the Indicator Subroutine at several
different times. By so doing, the information recorded in the indicator code
words of the memory records is maintained in its most precise form for use
by the subroutines. Instead of testing a word against a long list, for
example, to determine whether or not a rule is suitable for resolving a word's
ambiguity, the subroutine need test only a single bit position of an indicator
code word.

Other  Subroutines 

It is by means of the subroutines that the computer resolves the
ambiguous words of the text. There are six subroutines corresponding to the
six types of ambiguities which are to be resolved. Each subroutine is made up
of a) a control program, b) a rule table, and c) coding which represents, in
computer terms, each of the rules in the set for resolving the specific
ambiguity.

The control program is standard for all subroutines. It may be stated
as follows:

1) Advance a rule counter C.

2) Locate from a rule table the starting address of the coding
corresponding to Rule C.

3) Transfer to the coding for Rule C.

The coding for Rule C either resolves the ambiguity or returns to
step 1) above. If the ambiguity is resolved, the resolved part of speech
and the resolving rule number C are supplied to the specified portion of the
text word in memory. The counter is reset to 0 and the subroutine returns
to the main Control Routine.

The rule table consists of a list of symbolic addresses corresponding
to the entry points to coding for each rule in the set. The subroutine
executes the rules by transferring in sequence to the symbolic addresses in
this table.

The use of a rule table adds another feature of flexibility to the
system. By rearranging the sequence of the symbolic addresses in the table
it is possible to rearrange the order in which the rules of the set are
applied. It is further possible to eliminate the application of certain
rules by simply omitting their symbolic addresses from the table. Rules may
also be added by supplying the necessary coding for the rule, assigning it a
symbolic address, and inserting this address in the desired place in the
table. These changes are made by reassembling the program. The rule number
is not actually attached to any piece of coding, but corresponds to the order
of rule application.
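The standard control program and its rule table can be sketched in Python, with a list of functions taking the place of the table of symbolic addresses. The three rules shown are toy rules invented for the example, and the behaviour when the table is exhausted (returning `None`) is an assumption; the report does not say what the program does in that case.

```python
from typing import Callable, List, Optional, Tuple

Unit = List[str]
Rule = Callable[[Unit, int], Optional[str]]  # returns a part of speech, or None


def run_rule_table(rule_table: List[Rule], unit: Unit,
                   index: int) -> Tuple[Optional[str], Optional[int]]:
    """Apply the rules in table order until one resolves the ambiguity.

    The rule number C is simply the position of the rule in the table.
    """
    for c, rule in enumerate(rule_table, start=1):   # 1) advance the counter C
        part_of_speech = rule(unit, index)           # 2), 3) transfer to Rule C
        if part_of_speech is not None:
            return part_of_speech, c                 # resolved by Rule C
    return None, None                                # assumption: table exhausted


# Toy rules, hypothetical and much simpler than the linguistic rules.
def rule_after_to(unit: Unit, i: int) -> Optional[str]:
    return "VERB" if i > 0 and unit[i - 1] == "to" else None


def rule_after_article(unit: Unit, i: int) -> Optional[str]:
    return "NOUN" if i > 0 and unit[i - 1] == "the" else None


def rule_default(unit: Unit, i: int) -> Optional[str]:
    return "NOUN"


table = [rule_after_to, rule_after_article, rule_default]
first_result = run_rule_table(table, ["the", "walk"], 1)

# Reordering the table changes the order of application and the rule numbers.
reordered = [rule_after_article, rule_after_to, rule_default]
second_result = run_rule_table(reordered, ["the", "walk"], 1)
```

As in the text, a rule is dropped by leaving it out of the list and added by inserting a new function at the desired place, with no change to the control loop.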

The coding for a rule in the set may require the examination of the
indicator codes, the form classes, and other characteristics of the ambiguous
word or of the words which precede or follow it. When examination of words
other than the ambiguous word is required, the subroutine gains access to
these words by means of special "search" routines.

The search routines were developed to meet the common need of all the
subroutines. There are eight search routines: four to locate words to the
left of the given word and four to locate words to the right of this word.
They differ in their manner of searching. One of the routines, for example,
locates the word preceding the given word ignoring all "adverbs / prepositions"
which precede it.
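A search routine of this kind might look as follows in Python. The sketch uses one left and one right routine parameterised by the set of classes to ignore, rather than eight separate routines; that parameterisation, the class labels and the sample unit are assumptions made for illustration.

```python
from typing import List, Optional, Set, Tuple

Word = Tuple[str, str]  # (text, form-class label)


def search_left(unit: List[Word], index: int, ignored: Set[str]) -> Optional[int]:
    """Index of the nearest word to the left whose class is not ignored."""
    for j in range(index - 1, -1, -1):
        if unit[j][1] not in ignored:
            return j
    return None  # no such word precedes the given word


def search_right(unit: List[Word], index: int, ignored: Set[str]) -> Optional[int]:
    """Index of the nearest word to the right whose class is not ignored."""
    for j in range(index + 1, len(unit)):
        if unit[j][1] not in ignored:
            return j
    return None  # no such word follows the given word


sample_unit: List[Word] = [
    ("he", "PRON"),
    ("often", "ADV"),
    ("up", "ADV/PREP"),
    ("walks", "NOUN/VERB"),
]
nearest = search_left(sample_unit, 3, {"ADV/PREP"})
further = search_left(sample_unit, 3, {"ADV/PREP", "ADV"})
```

Each rule then tests the word at the returned index, which corresponds to the memory address that a search routine hands back to the subroutine.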

The flow chart of a typical rule in Appendix E illustrates the computer
technique for applying one of the linguistic rules for resolving noun/verb
present tense ambiguities.

Clause  Resolution 

Although there has been no programming in progress for this part of the
system, the general procedure is shown in the block diagram in Block 4,
Appendix C.


APPENDIX A  Card Formats

FORMAT #  FORMAT NAME      COLUMNS          FIELD NAME                      PARAMETERS

1         Dictionary       1-15             Dictionary Code
                           27               Special Code
                           35-3[?]          Dictionary Order #
                           43-67            Alphabetic Word

2         Text             1-10             Identification #
                           1[?]-72          Words of Text (described in
                                            Data Preparation)

3         General BCD      1                BCD Identification              "X"
          Update           2                Dictionary Update Code          D = yes
          Specifications   3-6              Header Block Size
                           9-12             Data Block Size
                           15-18            Logical Record Size
                           19               General Update Code             1 = by Data Block #
                                                                            2 = by Record #
                                                                            3 = by Word in Block
                                                                            4 = by Word in Record

4         Detail BCD       1                Modification Code               A = Add
          Update                                                            C = Change
          Specifications                                                    D = Delete
                                                                            S = Search
                           2                Secondary Output                S = yes
                           3                Starting Block or Record        0 = same
                                            ("D" cards)                     N = next
                           5-6              Number of Blocks or Records     01...10
                                            ("A" and "D" cards)
                           7-12             Block or Record #, or
                                            Starting Word in Block or
                                            Record of Match Field
                           13               0, or Starting Digit of
                                            Match Field
                           17-18            0, or Length of Match Field
                                            (digits, <= 36)
                           19-54            0, or Match Field
                           26...56-30...60  0, or Starting Word in Block
                                            or Record of Change Field
                           25...55          0, or Starting Digit of
                                            Change Field
                           31...61-36...66  0, or Length of Change Field
                                            in Digits
                           37...67-78       0, or Change Field (Change
                                            Field may continue in next
                                            cards, cols. 1-78)

5         General Binary   1                Binary General Identification   "X"
          Update           3-6              Header Block Size
          Specification    7-12             Other Block Size

6         Detail Binary    1                Binary Detail Identification    "Z"
          Update           3-6              Block # of Change
          Specification    7-12             Starting Wd. in Block for
                                            Change
                           17-18            Starting Bit in Word for
                                            Change (l->r = 1-36)
                           23-24            Length of Change Field in
                                            Bits (<= 36)
                           25-60            Change

N.B. All card cols. not specified above = 0; all padding = 0.


APPENDIX B

FILE #1  Dictionary

  Characteristics: Block = 500 Wd.; Record = 20 Wd.; Header/Trailer
  Sequence: 1 = Alphabetic Wd.

  Fields [word and digit positions illegible]:
    Alphabetic Wd. (0 padding)
    Dictionary Order #
    Dictionary Code
    Special Code
    Prior Alphabetic Group Code
    Succeeding Alphabetic Group Code
    Prior Alphabetic Group Record Count

FILE #2  Text

  Characteristics: Block = 500 Wd.; Record = 10 Wd.; Header/Trailer
  Sequences: 1 = Text Identification (Wd. 6, dig. 1-6; Wd. 7, dig. 3-4)
             2 = Text Identification
             3 = Alphabetic Word

  Fields [word and digit positions illegible]:
    P = Post Punctuation
    Preceding Punctuation
    Text Identification

N.B. 0 padding fills all unnamed digit positions of records.

FILE #3  Change

  Characteristics: Block = 14 Wd.; Record = 14 Wd.; Header/Trailer
  Sequence: Record 1, 2 or Record 3, 4

  RECORD
  FORMAT #  WORD         DIGIT  FIELD NAME                       PARAMETERS

  1         1            1      BCD Identification               "X"
            1            2      Dictionary Update Code           D = yes
            1            3-6    Header Block Size
            2            3-6    Data Block Size
            3            1      General Update Code              1 = by data block #
                                                                 2 = by record #
                                                                 3 = by word in block
                                                                 4 = by word in record

  2         1            1      Modification Code                A = Add
                                                                 C = Change
                                                                 D = Delete
                                                                 S = Search
            1            2      Secondary Output                 S = yes
            1            3      Starting Block or Record         0 = same
                                ("D" cards)                      N = next
            1            5-6    Number of Blocks or Records      01...10
                                ("A" and "D" cards)
            2            1-6    Block or Record #, or Starting
                                Word in Block or Record of
                                Match Field
            3            1      0, or Starting Digit of Match
                                Field
            3            5-6    0, or Length of Match Field
                                (digits, <= 36)
            4-9          1-6    0, or Match Field
            5...10       2-6    0, or Starting Word in Block or
                                Record of Change Field
            5...10       1      0, or Starting Digit of Change
                                Field
            6...11       1-6    0, or Length of Change Field in
                                Digits
            7...12-13    1-6    0, or Change Field (Change Field
                                may continue in next cards,
                                cols. 1-78)

  3         1            1      Binary General Identification    "X"
            1            3-6    Header Block Size
            2            1-6    Other Block Size

  4         1            1      Binary Detail Identification     "Z"
            1            3-6    Block # of Change
            2            1-6    Starting Wd. in Block for Change
            3            5-6    Starting Bit in Word for Change
                                (l->r = 1-36)
            4            5-6    Length of Change Field in Bits
                                (<= 36)
            5-10         1-6    Change

FILE #4  Appended Text

  Characteristics: Block = 500 Wd.; Record = 20 Wd.; Header/Trailer
  Sequences: 1 = Alphabetic Wd.
             2 = Text Identification

  RECORD
  FORMAT #  WORD   DIGIT  FIELD NAME                    PARAMETERS

  1         1      [?]    Alphabetic Word
            5      2      P = Post Punctuation
            5      5      Special Code
            5      6      Capital Letter Code
            6      1-6    Text Identification
            7      3-4    Text Identification
            7      5-6    Final Affix Code
            8      1-2    (Final -1) Affix Code
            9-11   [?]    Alphabetic Word Stem
            12     1      Noun/Pronoun Code             1 = Noun
                                                        2 = Pronoun
                                                        0 = Neither
            12     2      Verb/Conj. Code               [illegible]
            12     3      Adj/Prep Code                 [illegible]
            12     4      Adv. Code                     [illegible]
            12     5-6    Noun/Pron. Subclass
            13     1-2    Verb/Conj. Subclass
            13     3      Adj/Prep Subclass
            13     4      Adv. Subclass
            20     1-5    Dictionary Code
                          (first five digits)


FILE #5  Error

  Characteristics: Block = 20 Wd.; Record = 10 Wd.; Header/Trailer
  Record Format 1 = Record Format, File #2

FILE #6  Resolved Text (N)

  Characteristics: Block = 500 Wd.; Record = 20 Wd.; Header/Trailer
  Sequence: 1 = Text Identification

  All fields are identical to Record Format 1, File 4.
  Additional fields are:

  WORD    DIGIT  FIELD NAME

  15-17   1-6    Indicator Codes
  18      1-6    Resolved Part of Speech
  19      1-6    Resolving Rule #


APPENDIX C

[Block diagram of the data processing system. Recoverable labels: Block 1;
Appended Text, File #4; Text ID; Resolved Text (1), File #6; Print; Edit;
Word Resolution Report; Ambiguous Word; Clause Resolution Report.]


APPENDIX D  Control Flow Chart

[Flow chart. Recoverable legend: one symbol is glossed as "symbol for
action X".]


APPENDIX E

Typical Rule, noun/verb present tense

Rule C

Is the ambiguous word preceded (ignoring adv/prep and adj/pronouns) by a
preposition? If yes, assign as a noun. If no, or if no word precedes, apply
the next sequential rule.

[Flow chart. Recoverable labels: "Store C in Rule # and "NOUN" in Part of
Speech for ambig. wd."; "Main Control".]

* L3 is a search routine which locates the first word (which does not belong
to the class "adv/prep and adj/pron") preceding the given one. If a comma or
no word is found to precede the given word, the search routine returns to
BEGIN. Otherwise, the memory address of the word found is made available to
the subroutine for testing purposes.
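Rule C and the L3 search routine can be put together in a brief Python sketch. The word representation, the class labels and the choice to stop as soon as a comma is met during the leftward scan are assumptions; the flow chart itself is not legible enough to confirm the exact order of the tests.

```python
from typing import List, Optional, Tuple

Word = Tuple[str, str]  # (text, form-class label)

IGNORED_CLASSES = {"ADV/PREP", "ADJ/PRON"}  # classes skipped by the L3 search


def l3_search(unit: List[Word], index: int) -> Optional[int]:
    """Locate the first preceding word outside the ignored classes.

    Returns None if a comma is met or if no such word precedes (assumption:
    the comma test is made at every step of the scan).
    """
    for j in range(index - 1, -1, -1):
        text, form_class = unit[j]
        if text == ",":
            return None
        if form_class not in IGNORED_CLASSES:
            return j
    return None


def rule_c(unit: List[Word], index: int) -> Optional[str]:
    """Return "NOUN" if the rule resolves the word, else None (try next rule)."""
    found = l3_search(unit, index)
    if found is not None and unit[found][1] == "PREP":
        return "NOUN"
    return None


after_preposition = [("of", "PREP"), ("his", "ADJ/PRON"), ("walks", "NOUN/VERB")]
after_comma = [("of", "PREP"), (",", "PUNCT"), ("walks", "NOUN/VERB")]
after_pronoun = [("they", "PRON"), ("walks", "NOUN/VERB")]
```

Returning `None` stands for the transfer back to the subroutine's control program, which then advances the rule counter and applies the next sequential rule.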
(X) = possibly class X
(-ed) = past part. & verb form
Count. = countable
Pro/AJ = exactly Pro/AJ ambiguous
L.D.P. = Limited dist. participle
U.D. = unusual distribution